Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning

نویسندگان

  • Chaitanya Chemudugunta
  • America Holloway
  • Padhraic Smyth
  • Mark Steyvers
چکیده

Human-defined concepts are fundamental building-blocks in constructing knowledge bases such as ontologies. Statistical learning techniques provide an alternative automated approach to concept definition, driven by data rather than prior knowledge. In this paper we propose a probabilistic modeling framework that combines both human-defined concepts and data-driven topics in a principled manner. The methodology we propose is based on applications of statistical topic models (also known as latent Dirichlet allocation models). We demonstrate the utility of this general framework in two ways. We first illustrate how the methodology can be used to automatically tag Web pages with concepts from a known set of concepts without any need for labeled documents. We then perform a series of experiments that quantify how combining human-defined semantic knowledge with data-driven techniques leads to better language models than can be obtained with either alone.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Quantification et identification des concepts implicites d'une requête

In this paper we introduce an unsupervised method for mining and modeling latent search concepts. We use Latent Dirichlet Allocation (LDA), a generative probabilistic topic model, to exhibit highly-specific query-related topics from pseudo-relevant feedback documents. Our approach automatically estimates the number of latent concepts as well as the needed amount of feedback documents, without a...

متن کامل

An Unsupervised Model for Rule-Based Similarity Learning from Textual Data

This paper presents a preliminary research on construction of a new unsupervised model for learning a semantic similarity measure from text corpora. Two main components of the model are a semantic interpreter of texts and a rule-based similarity function. The first one associates particular documents with concepts defined in a knowledge base which corresponds to the topics covered by the corpus...

متن کامل

Learning Concept Hierarchies through Probabilistic Topic Modeling

With the advent of semantic web, various tools and techniques have been introduced for presenting and organizing knowledge. Concept hierarchies are one such technique which gained significant attention due to its usefulness in creating domain ontologies that are considered as an integral part of semantic web. Automated concept hierarchy learning algorithms focus on extracting relevant concepts ...

متن کامل

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

Presentation of an efficient automatic short answer grading model based on combination of pseudo relevance feedback and semantic relatedness measures

Automatic short answer grading (ASAG) is the automated process of assessing answers based on natural language using computation methods and machine learning algorithms. Development of large-scale smart education systems on one hand and the importance of assessment as a key factor in the learning process and its confronted challenges, on the other hand, have significantly increased the need for ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008